Custom-tailoring TTS voice font - keeping the naturalness when reducing database size
نویسندگان
چکیده
This paper presents a framework for custom-tailoring voice font in data-driven TTS systems. Three criteria for unit pruning, the prosodic outlier criterion, the importance criterion and the combination of the two, are proposed. The performance of voice fonts in different sizes which are pruned with the three criteria is evaluated by simulating speech synthesis over large amount of texts and estimating the naturalness with an objective measure at the same time. The result shows that the combined criterion performs the best among the three. The preestimated curve for naturalness vs. database size might be used as a reference for custom-tailoring voice font. The naturalness remains almost unchanged when 50% of instances are pruned off with the combined criterion.
منابع مشابه
Non - Uniform Granular Clustering Approach Pruning Synthesis Instances ?
The employment of non-uniform helps Corpus-based TTS synthesize high natural speech. But tailoring TTS voice font, or pruning redundant synthesis instances, usually results in loss of non-uniform. In order to solve this problem, this paper proposes the algorithm named NuClustering-VPA. According to this algorithm, the high-level non-uniforms containing same syllables are clustered to several ce...
متن کاملTTS Evaluation Campaign with a Common Spanish Database
This paper describes the first TTS evaluation campaign designed for Spanish. Seven research institutions took part in the evaluation campaign and developed a voice from a common speech database provided by the organisation. Each participating team had a period of seven weeks to generate a voice. Next, a set of sentences were released and each team had to synthesise them within a week period. Fi...
متن کاملBeyond the Listening Test: An Interactive Approach to TTS Evaluation
Traditionally, subjective text-to-speech (TTS) evaluation is performed through audio-only listening tests, where participants evaluate unrelated, context-free utterances. The ecological validity of these tests is questionable, as they do not represent real-world end-use scenarios. In this paper, we examine a novel approach to TTS evaluation in an imagined end-use, via a complex interaction with...
متن کاملLinear networks based speaker adaptation for speech synthesis
Speaker adaptation methods aim to create fair quality synthesis speech voice font for target speakers while only limited resources available. Recently, as deep neural networks based statistical parametric speech synthesis (SPSS) methods become dominant in SPSS TTS back-end modeling, speaker adaptation under the neural network based SPSS framework has also became an important task. In this paper...
متن کاملOn-line experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference
Three experiments are reported that use new experimental methods for the evaluation of text-to-speech (TTS) synthesis from the user’s perspective. Experiment 1, using sentence stimuli, and Experiment 2, using discrete ‘‘call centre’’ word stimuli, investigated the effect of voice gender and signal quality on the intelligibility of three concatenative TTS synthesis systems. Accuracy and search t...
متن کامل